Faster PLSA

نویسندگان

  • Valeriy Avanesov
  • Ilya Kozlov
چکیده

Probabilistic Latent Semantic Analysis (PLSA) is an effective technique for information retrieval, but it has a serious drawback: it consumes a huge amount of computational resources, so it is hard to train this model on a large collection of documents. The aim of this paper is to improve time efficiency of the training algorithm. Two different approaches are explored: one is based on efficient finding of an appropriate initial approximation, the idea of another is that for the most of collection topics may be extracted from relatively small fraction of the data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Correlated PLSA for Image Clustering

Probabilistic Latent Semantic Analysis (PLSA) has become a popular topic model for image clustering. However, the traditional PLSA method considers each image (document) independently, which would often be conflict with the real occasion. In this paper, we presents an improved PLSA model, named Correlated Probabilistic Latent Semantic Analysis (C-PLSA). Different from PLSA, the topics of the gi...

متن کامل

A New Bigram-PLSA Language Model for Speech Recognition

A novel method for combining bigram model and Probabilistic Latent Semantic Analysis (PLSA) is introduced for language modeling. The motivation behind this idea is the relaxation of the “bag of words” assumption fundamentally present in latent topic models including the PLSA model. An EM-based parameter estimation technique for the proposed model is presented in this paper. Previous attempts to...

متن کامل

Learning spoken document similarity and recommendation using supervised probabilistic latent semantic analysis

This paper presents a model-based approach to spoken document similarity called Supervised Probabilistic Latent Semantic Analysis (PLSA). The method differs from traditional spoken document similarity techniques in that it allows similarity to be learned rather than approximated. The ability to learn similarity is desirable in applications such as Internet video recommendation, in which complex...

متن کامل

Multilabel Image Annotation Based on Double-Layer PLSA Model

Due to the semantic gap between visual features and semantic concepts, automatic image annotation has become a difficult issue in computer vision recently. We propose a new image multilabel annotation method based on double-layer probabilistic latent semantic analysis (PLSA) in this paper. The new double-layer PLSA model is constructed to bridge the low-level visual features and high-level sema...

متن کامل

Bayesian learning for latent semantic analysis

Probabilistic latent semantic analysis (PLSA) is a popular approach to text modeling where the semantics and statistics in documents can be effectively captured. In this paper, a novel Bayesian PLSA framework is presented. We focus on exploiting the incremental learning algorithm for solving the updating problem of new domain articles. This algorithm is developed to improve text modeling by inc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014